Smart Paper Notes

A Theoretical View on Sparsely Activated Networks

Cenk Baykal , Nishanth Dikkala , Rina Panigrahy , Cyrus Rashtchian , Xin Wang

Categories: Machine Learning | Machine Learning (Statistics)

2022-08-08

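Deep neural networks now successfully fit very complex functions, but dense models are becoming prohibitively expensive for inference. One promising way to mitigate this is to use networks that activate only a sparse subgraph of the network. The subgraph is chosen by a data-dependent routing function, which enforces a fixed mapping of inputs to subnetworks (e.g., the Mixture of Experts (MoE) paradigm in Switch Transformers). However, prior work is largely empirical, and while existing routing functions work well in practice, they do not lead to theoretical guarantees on approximation ability. We aim to provide a theoretical explanation for the power of sparse networks. As our first contribution, we present a formal model of data-dependent sparse networks that captures salient aspects of popular architectures. We then introduce a routing function based on locality-sensitive hashing (LSH) that enables us to reason about how well sparse networks approximate target functions. After representing LSH-based sparse networks with our model, we prove that sparse networks can match the approximation power of dense networks on Lipschitz functions. Applying LSH to the input vectors means that the experts interpolate the target function in different subregions of the input space. To support our theory, we define various datasets based on Lipschitz target functions, and we show that sparse networks give a favorable trade-off between the number of active units and approximation quality.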

Translated by Google Translate
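
The LSH-based routing idea in the abstract above can be illustrated with a small sketch. This is not the paper's construction: it assumes sign-random-projection hashing (one bit per random hyperplane through the origin) as the LSH family, and it assumes each "expert" is just a constant equal to the mean target value of the training points routed to it. The names `make_hyperplanes`, `route`, `fit_experts`, and `predict` are hypothetical.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random Gaussian hyperplane normals in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def route(x, hyperplanes):
    """Hash x to an expert index: one bit per hyperplane (sign of the dot product)."""
    index = 0
    for h in hyperplanes:
        dot = sum(hi * xi for hi, xi in zip(h, x))
        index = (index << 1) | (1 if dot >= 0.0 else 0)
    return index


def fit_experts(points, targets, hyperplanes):
    """Assumption: each expert is a constant, the mean target in its hash bucket."""
    sums, counts = {}, {}
    for x, y in zip(points, targets):
        e = route(x, hyperplanes)
        sums[e] = sums.get(e, 0.0) + y
        counts[e] = counts.get(e, 0) + 1
    return {e: sums[e] / counts[e] for e in sums}


def predict(x, hyperplanes, experts, default=0.0):
    """Only the single expert selected by the hash is evaluated for input x."""
    return experts.get(route(x, hyperplanes), default)
```

With `num_bits` hyperplanes there are at most `2 ** num_bits` experts, yet each prediction touches only one of them, which is the sparsity the abstract refers to. Nearby inputs tend to share a bucket, so each expert only has to approximate the target on its own subregion.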

Compositional Learning of Dynamical System Models Using Port-Hamiltonian Neural Networks

Cyrus Neary , Ufuk Topcu

Categories: Machine Learning | Artificial Intelligence

2022-12-01

Many dynamical systems -- from robots interacting with their surroundings to large-scale multiphysics systems -- involve a number of interacting subsystems. Toward the objective of learning composite models of such systems from data, we present i) a framework for compositional neural networks, ii) algorithms to train these models, iii) a method to compose the learned models, iv) theoretical results that bound the error of the resulting composite models, and v) a method to learn the composition itself, when it is not known a priori. The end result is a modular approach to learning: neural network submodels are trained on trajectory data generated by relatively simple subsystems, and the dynamics of more complex composite systems are then predicted without requiring additional data generated by the composite systems themselves. We achieve this compositionality by representing the system of interest, as well as each of its subsystems, as a port-Hamiltonian neural network (PHNN) -- a class of neural ordinary differential equations that uses the port-Hamiltonian systems formulation as inductive bias. We compose collections of PHNNs by using the system's physics-informed interconnection structure, which may be known a priori, or may itself be learned from data. We demonstrate the novel capabilities of the proposed framework through numerical examples involving interacting spring-mass-damper systems. Models of these systems, which include nonlinear energy dissipation and control inputs, are learned independently. Accurate compositions are learned using an amount of training data that is negligible in comparison with that required to train a new model from scratch. Finally, we observe that the composite PHNNs enjoy properties of port-Hamiltonian systems, such as cyclo-passivity -- a property that is useful for control purposes.
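
To make the port-Hamiltonian formulation mentioned above concrete, here is a minimal sketch of a single spring-mass-damper written in the form dx/dt = (J - R) ∇H(x) + G u. It is an illustration under stated assumptions, not the paper's method: the Hamiltonian is a known quadratic energy rather than a neural network, the damping is linear, and the integrator is a plain fixed-step RK4. All function names and parameter values are hypothetical.

```python
def grad_hamiltonian(q, p, m, k):
    """Gradient of H(q, p) = p^2 / (2 m) + k q^2 / 2 (assumed known, not learned)."""
    return k * q, p / m


def dynamics(q, p, u, m, k, c):
    """Port-Hamiltonian form with J = [[0, 1], [-1, 0]], R = diag(0, c), G = (0, 1)."""
    dH_dq, dH_dp = grad_hamiltonian(q, p, m, k)
    dq = dH_dp                    # J row 1: velocity
    dp = -dH_dq - c * dH_dp + u   # J row 2, minus dissipation R, plus input port G u
    return dq, dp


def rk4_step(q, p, u, h, m, k, c):
    """One classical fourth-order Runge-Kutta step with the input u held constant."""
    k1 = dynamics(q, p, u, m, k, c)
    k2 = dynamics(q + 0.5 * h * k1[0], p + 0.5 * h * k1[1], u, m, k, c)
    k3 = dynamics(q + 0.5 * h * k2[0], p + 0.5 * h * k2[1], u, m, k, c)
    k4 = dynamics(q + h * k3[0], p + h * k3[1], u, m, k, c)
    q_next = q + h / 6.0 * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0])
    p_next = p + h / 6.0 * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1])
    return q_next, p_next


def energy(q, p, m, k):
    """Stored energy H(q, p)."""
    return p * p / (2.0 * m) + 0.5 * k * q * q


def simulate(q, p, h, steps, m=1.0, k=1.0, c=0.1, u=0.0):
    """Integrate the system for a number of fixed-size steps and return the final state."""
    for _ in range(steps):
        q, p = rk4_step(q, p, u, h, m, k, c)
    return q, p
```

Along exact trajectories the stored energy satisfies dH/dt = -c (p/m)^2 + (p/m) u, so with u = 0 it can only decrease. This energy-balance structure is the kind of property the abstract says the composite models inherit.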


Geometric Learning of Hidden Markov Models via a Method of Moments Algorithm

Berlin Chen , Cyrus Mostajeran , Salem Said

Categories: Machine Learning

2022-07-02

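We present a novel algorithm for learning the parameters of hidden Markov models (HMMs) in a geometric setting where the observations take values in Riemannian manifolds. In particular, we lift a second-order method of moments algorithm that incorporates non-consecutive correlations to a more general setting in which observations take place in a Riemannian symmetric space of non-positive curvature and the observation likelihoods are Riemannian Gaussians. The resulting algorithm decouples into a Riemannian Gaussian mixture model estimation algorithm followed by a sequence of convex optimization procedures. We demonstrate through examples that the learner can achieve significantly improved speed and numerical accuracy compared to existing learners.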


Differentiable Time-Frequency Scattering on GPU

John Muradeli , Cyrus Vahidi , Changhong Wang , Han Han , Vincent Lostanlen , Mathieu Lagrange , George Fazekas

Categories: Machine Learning

2022-04-18

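Joint time-frequency scattering (JTFS) is a convolutional operator in the time-frequency domain that extracts spectrotemporal modulations at various rates and scales. It offers an idealized model of spectrotemporal receptive fields (STRF) in the primary auditory cortex, and thus may serve as a biologically plausible surrogate for human perceptual judgments at the scale of isolated audio events. Yet prior implementations of JTFS and STRF have remained outside the standard toolkit of perceptual similarity measures and evaluation methods for audio generation. We trace this issue to three limitations: differentiability, speed, and flexibility. In this paper, we present an implementation of time-frequency scattering in Python. Unlike prior implementations, ours accommodates NumPy, PyTorch, and TensorFlow as backends and is thus portable to both CPU and GPU. We illustrate the usefulness of JTFS through three applications: unsupervised manifold learning of spectrotemporal modulations, supervised classification of musical instruments, and texture resynthesis of bioacoustic sounds.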


Riemannian statistics meets random matrix theory: towards learning from high-dimensional covariance matrices

Salem Said , Simon Heuveline , Cyrus Mostajeran

Categories: Machine Learning (Statistics)

2022-03-01

Riemannian Gaussian distributions were initially introduced as basic building blocks for learning models which aim to capture the intrinsic structure of statistical populations of positive-definite matrices (here called covariance matrices). While the potential applications of such models have attracted significant attention, a major obstacle still stands in the way of these applications: there seems to exist no practical method of computing the normalising factors associated with Riemannian Gaussian distributions on spaces of high-dimensional covariance matrices. The present paper shows that this missing method comes from an unexpected new connection with random matrix theory. Its main contribution is to prove that Riemannian Gaussian distributions of real, complex, or quaternion covariance matrices are equivalent to orthogonal, unitary, or symplectic log-normal matrix ensembles. This equivalence yields a highly efficient approximation of the normalising factors, in terms of a rather simple analytic expression. The error due to this approximation decreases like the inverse square of dimension. Numerical experiments are conducted which demonstrate how this new approximation can unlock the difficulties which have impeded applications to real-world datasets of high-dimensional covariance matrices. The paper then turns to Riemannian Gaussian distributions of block-Toeplitz covariance matrices. These are equivalent to yet another kind of random matrix ensembles, here called "acosh-normal" ensembles. Orthogonal and unitary "acosh-normal" ensembles correspond to the cases of block-Toeplitz with Toeplitz blocks, and block-Toeplitz (with general blocks) covariance matrices, respectively.


Taylor-Lagrange Neural Ordinary Differential Equations: Toward Fast Training and Evaluation of Neural ODEs

Franck Djeumou , Cyrus Neary , Eric Goubault , Sylvie Putot , Ufuk Topcu

Categories: Machine Learning

2022-01-14

Neural ordinary differential equations (NODEs) -- parametrizations of differential equations using neural networks -- have shown tremendous promise in learning models of unknown continuous-time dynamical systems from data. However, every forward evaluation of a NODE requires numerical integration of the neural network used to capture the system dynamics, making their training prohibitively expensive. Existing works rely on off-the-shelf adaptive step-size numerical integration schemes, which often require an excessive number of evaluations of the underlying dynamics network to obtain sufficient accuracy for training. By contrast, we accelerate the evaluation and the training of NODEs by proposing a data-driven approach to their numerical integration. The proposed Taylor-Lagrange NODEs (TL-NODEs) use a fixed-order Taylor expansion for numerical integration, while also learning to estimate the expansion's approximation error. As a result, the proposed approach achieves the same accuracy as adaptive step-size schemes while employing only low-order Taylor expansions, thus greatly reducing the computational cost necessary to integrate the NODE. A suite of numerical experiments, including modeling dynamical systems, image classification, and density estimation, demonstrates that TL-NODEs can be trained more than an order of magnitude faster than state-of-the-art approaches, without any loss in performance.
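
The fixed-order Taylor integration mentioned above can be sketched for the simplest case. This is only an illustration, assuming a scalar autonomous ODE dx/dt = f(x) with a hand-supplied derivative `df`, and a second-order expansion x(t + h) ≈ x + h f(x) + (h²/2) f'(x) f(x). It deliberately omits the learned estimate of the remainder term that the abstract describes, and the names `taylor2_step` and `integrate` are hypothetical.

```python
import math


def taylor2_step(f, df, x, h):
    """One second-order Taylor step for the scalar autonomous ODE dx/dt = f(x).

    Uses d2x/dt2 = f'(x) * f(x), which follows from the chain rule.
    """
    fx = f(x)
    return x + h * fx + 0.5 * h * h * df(x) * fx


def integrate(f, df, x0, t_end, num_steps):
    """Integrate from t = 0 to t_end with a fixed step size and a fixed expansion order."""
    h = t_end / num_steps
    x = x0
    for _ in range(num_steps):
        x = taylor2_step(f, df, x, h)
    return x


if __name__ == "__main__":
    # Linear test problem dx/dt = -x with exact solution exp(-t).
    approx = integrate(lambda x: -x, lambda x: -1.0, 1.0, 1.0, 100)
    print(approx, math.exp(-1.0))
```

Each step costs a fixed number of evaluations of `f` and `df`, in contrast to adaptive step-size schemes whose evaluation count varies with the dynamics. The price is a truncation error of order h³ per step, which is the quantity the abstract says TL-NODEs learn to estimate.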


Fast Doubly-Adaptive MCMC to Estimate the Gibbs Partition Function with Weak Mixing Time Bounds

Shahrzad Haddadan , Yue Zhuang , Cyrus Cousins , Eli Upfal

Categories: Machine Learning (Statistics)

2021-11-14

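We present a novel method for reducing the computational complexity of estimating the partition function (normalizing constant) of Gibbs (Boltzmann) distributions, which are ubiquitous in probabilistic graphical models. A major obstacle to practical applications of Gibbs distributions is the need to estimate their partition functions. The state of the art in addressing this problem is multi-stage algorithms, which consist of a cooling schedule and a mean estimator at each step of the schedule. While the cooling schedule in these algorithms is adaptive, the mean estimation computations use MCMC as a black box to draw approximate samples. We develop a doubly adaptive approach that combines the adaptive cooling schedule with an adaptive MCMC mean estimator, whose number of Markov chain steps adapts dynamically to the underlying chain. Through rigorous theoretical analysis, we prove that our method outperforms state-of-the-art algorithms in several respects: (1) the computational complexity of our method is smaller; (2) our method is less sensitive to loose bounds on mixing times, an inherent component of these algorithms; and (3) the improvement obtained by our method is particularly significant in the most challenging regime of high-precision estimation. We demonstrate the advantage of our method in experiments run on classic factor graphs, such as voting models and Ising models.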


Neural Networks with Physics-Informed Architectures and Constraints for Dynamical Systems Modeling

Franck Djeumou , Cyrus Neary , Eric Goubault , Sylvie Putot , Ufuk Topcu

Categories: Machine Learning | Robotics

2021-09-14

Effective inclusion of physics-based knowledge into deep neural network models of dynamical systems can greatly improve data efficiency and generalization. Such a-priori knowledge might arise from physical principles (e.g., conservation laws) or from the system's design (e.g., the Jacobian matrix of a robot), even if large portions of the system dynamics remain unknown. We develop a framework to learn dynamics models from trajectory data while incorporating a-priori system knowledge as inductive bias. More specifically, the proposed framework uses physics-based side information to inform the structure of the neural network itself, and to place constraints on the values of the outputs and the internal states of the model. It represents the system's vector field as a composition of known and unknown functions, the latter of which are parametrized by neural networks. The physics-informed constraints are enforced via the augmented Lagrangian method during the model's training. We experimentally demonstrate the benefits of the proposed approach on a variety of dynamical systems -- including a benchmark suite of robotics environments featuring large state spaces, non-linear dynamics, external forces, contact forces, and control inputs. By exploiting a-priori system knowledge during training, the proposed approach learns to predict the system dynamics two orders of magnitude more accurately than a baseline approach that does not include prior knowledge, given the same training dataset.


Online learning of Riemannian hidden Markov models in homogeneous Hadamard spaces

Quinten Tupker , Salem Said , Cyrus Mostajeran

Categories: Machine Learning

2021-02-15

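Hidden Markov models with observations in a Euclidean space play an important role in signal and image processing. Previous work that extended these models to observations lying in Riemannian manifolds, based on the Baum-Welch algorithm, suffered from high memory usage and slow speed. Here we present an algorithm that is online, more accurate, and offers dramatic improvements in speed and efficiency.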


Achieving Fairness via Post-Processing in Web-Scale Recommender Systems

Preetam Nandy , Cyrus Diciccio , Divya Venugopalan , Heloise Logan , Kinjal Basu , Noureddine El Karoui

Categories: Machine Learning (Statistics) | Machine Learning

2020-06-19

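Building fair recommender systems is a challenging and crucial area of study because of its immense impact on society. We extend the definitions of two commonly accepted notions of fairness to recommender systems, namely equality of opportunity and equalized odds. These fairness measures ensure that equally "qualified" (or "unqualified") candidates are treated equally regardless of their protected attribute status (such as gender or race). We propose scalable methods for achieving equality of opportunity and equalized odds in rankings in the presence of position bias, which commonly plagues data generated by recommender systems. Our algorithms are model agnostic in the sense that they depend only on the final scores provided by a model, making them easily applicable to virtually all web-scale recommender systems. We conduct extensive simulations as well as real-world experiments to show the efficacy of our approach.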
